- North America > United States > Pennsylvania (0.04)
- Asia > China > Zhejiang Province (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
\texttt{R$^\textbf{2}$AI}: Towards Resistant and Resilient AI in an Evolving World
Sun, Youbang, Wang, Xiang, Fu, Jie, Lu, Chaochao, Zhou, Bowen
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into ``Make AI Safe'', which applies post-hoc alignment and guardrails but remains brittle and reactive, and ``Make Safe AI'', which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose \textit{safe-by-coevolution} as a new formulation of the ``Make Safe AI'' paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce \texttt{R$^2$AI} -- \textit{Resistant and Resilient AI} -- as a practical framework that unites resistance against known threats with resilience to unforeseen risks. \texttt{R$^2$AI} integrates \textit{fast and slow safe models}, adversarial simulation and verification through a \textit{safety wind tunnel}, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
- Information Technology (0.93)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
Exclusive: New Research Shows AI Strategically Lying
For years, computer scientists have worried that advanced artificial intelligence might be difficult to control. A smart enough AI might pretend to comply with the constraints placed upon it by its human creators, only to reveal its dangerous capabilities at a later point. Until this month, these worries have been purely theoretical. Some academics have even dismissed them as science fiction. But a new paper, shared exclusively with TIME ahead of its publication on Wednesday, offers some of the first evidence that today's AIs are capable of this type of deceit. The paper, which describes experiments jointly carried out by the AI company Anthropic and the nonprofit Redwood Research, shows a version of Anthropic's model, Claude, strategically misleading its creators during the training process in order to avoid being modified.
Randomization Techniques to Mitigate the Risk of Copyright Infringement
Chen, Wei-Ning, Kairouz, Peter, Oh, Sewoong, Xu, Zheng
In this paper, we investigate potential randomization approaches that can complement current practices of input-based methods (such as licensing data and prompt filtering) and output-based methods (such as recitation checkers, license checkers, and model-based similarity scores) for copyright protection. This is motivated by the inherent ambiguity of the rules that determine substantial similarity in copyright precedents. Given that there is no agreed-upon quantifiable measure of substantial similarity, complementary approaches can potentially further decrease liability. Similar randomized approaches, such as differential privacy, have been successful in mitigating privacy risks. This document focuses on the technical and research perspective on mitigating copyright violation and hence is not confidential. After investigating potential solutions and running numerical experiments, we concluded that using the notion of Near Access-Freeness (NAF) to measure the degree of substantial similarity is challenging, and the standard approach of training a Differentially Private (DP) model incurs a significant cost when used to ensure NAF. Alternative approaches, such as retrieval models, might provide a more controllable scheme for mitigating substantial similarity.
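The abstract notes that training a DP model is the standard randomization approach it evaluates. The core aggregation step of DP training (clip each per-example gradient, sum, add calibrated Gaussian noise) can be sketched as follows; this is a minimal illustration of the general DP-SGD mechanism, not the authors' implementation, and the function name and parameters are hypothetical:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step: clip each per-example gradient to
    clip_norm, sum the clipped gradients, then add Gaussian noise whose
    scale is proportional to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clip bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is noise_multiplier * clip_norm (the
    # per-example sensitivity of the clipped sum).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Because each example's contribution is bounded by `clip_norm`, the added noise masks the influence of any single training example, which is the property DP leverages; the abstract's point is that the noise level needed to also guarantee NAF-style output similarity bounds is much more costly.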
TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation
Wu, Juan-Ni, Wang, Tong, Tang, Li-Juan, Wu, Hai-Long, Yu, Ru-Qin
String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the requirement for paired symbols and the parsing algorithm result in long-range grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. This study introduces a supplementary algorithm, TSIS (TSID Simplified), to the t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE presents challenges in managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure and encoding logic, which sets it apart from the SAFE model. The performance of TSIS models surpasses that of SAFE models, indicating that the algorithms of the t-SMILES family offer certain advantages.
Can Copyright be Reduced to Privacy?
Elkin-Koren, Niva, Hacohen, Uri, Livni, Roi, Moran, Shay
Recent advancements in Machine Learning have sparked a wave of new possibilities and applications that could potentially transform various aspects of our daily lives and revolutionize numerous professions through automation. However, training such algorithms relies heavily on extensive content, either annotated or generated by individuals who may be impacted by these algorithms. Consequently, the identification and determination of when and how content can be used within this framework without infringing upon individuals' legal rights have become a pressing challenge.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Iowa (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Media (1.00)
- Law > Intellectual Property & Technology Law (1.00)
- Government (1.00)
- (2 more...)